ITRI-01-10 Generative lexicon meets corpus data: the case of non-standard word uses

Author

  • Adam Kilgarriff
Abstract

There are various ways to evaluate the Generative Lexicon (GL). One is to see to what extent it accounts for what we find in text corpora. This has not previously been done, and this chapter presents a first foray. The experiment looks at the “non-standard” uses of words found in a sample of corpus data: “non-standard” is defined as not matching a literal reading of any of the word’s dictionary definitions. For each non-standard instance we asked whether it could be analysed using GL strategies. Most cases could not. The chapter discusses in detail a number of non-standard uses and presents a model for their interpretation which draws on large quantities of knowledge about how the word has been used in the past. The knowledge is frequently indeterminate between ‘lexical’ and ‘general’, and is usually triggered by collocations rather than a single word in isolation.


Similar resources

When Embodiment Meets Generative Lexicon: The Human Body Part Metaphors in Sinica Corpus

This research aims to integrate embodiment with generative lexicon. By analyzing the metaphorically used human body part terms in Sinica Corpus, the first balanced modern Chinese corpus, we reveal how these two theories complement each other. Embodiment strengthens generative lexicon by spelling out the cognitive reasons which underlie the production of meaning, and generative lexicon, specifi...


When GL meets the corpus: a data-driven investigation of lexical types and coercion phenomena

In this paper we present an analysis of corpus-derived V-arg combinations aiming to provide a data-driven characterization of Lexical Types (LTs) and represent how types behave compositionally, i.e. how they enter compositional processes and are modulated by them. We will do so using the enriched compositional rules and the type system as presented in Pustejovsky (2006). Our main concerns are tw...


ITRI-98-01 Bridging the gap between lexicon and corpus: convergence of formalisms

I first consider the spectrum of lexical information from the semantic to the textual. A range of lexicons are classified according to where they sit on this scale. Lexicographic tools and WSD programs are included in the classification, and this is justified. There is currently a lacuna between the most text-oriented of the lexicographic approaches, and the most sophisticated of the data-driven...


Improvements to Korektor: A Case Study with Native and Non-Native Czech

We present recent developments of Korektor, a statistical spell checking system. In addition to lexicon, Korektor uses language models to find real-word errors, detectable only in context. The models and error probabilities, learned from error corpora, are also used to suggest the most likely corrections. Korektor was originally trained on a small error corpus and used language models extracted...


ITRI-99-07 Duplication in Corpora

We investigate duplication, a pervasive problem in NLP corpora. We present a method for finding it that uses word frequency list comparisons and experiment with this method on different units of duplication.



Publication date: 2001